Idioms in Context: The IDIX Corpus
Abstract
Idioms and other figuratively used expressions pose considerable problems for natural language processing applications because they are very frequent and often behave idiosyncratically. Consequently, there has been much research on the automatic detection and extraction of idiomatic expressions. Most studies focus on type-based idiom detection, i.e., determining whether a given expression can (potentially) be used idiomatically. However, many expressions, such as "break the ice", can have both literal and non-literal readings and need to be disambiguated in a given context (token-based detection). So far, relatively few approaches have attempted context-based idiom detection. One reason for this may be that few annotated resources are available that disambiguate expressions in context. With the IDIX corpus, we aim to address this gap. IDIX is available as an add-on to the BNC and disambiguates different usages of a subset of idioms. We believe that this resource will be useful for both linguistic and computational linguistic studies.
Related resources
The Impact of Context on the Learning and Retention of Idioms
The purpose of the present study was to investigate the effect of context on learning idioms among 60 Iranian female advanced English learners. To this end, the researcher assigned the participants to two experimental groups and one control group: Group 1 (first experimental group, the extended-context group), Group 2 (second experimental group, the limited-context group) and Group 3 (control g...
Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features
Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructed an idiom corpus for Japanese. This paper reports on the corpus and the results of an idiom identification experiment using the corpus. The corpus targets 146 amb...
Drawing a Line between Literal and Idiomatic Meanings Based on Supervised WSD
Hashimoto, Chikara and Kawahara, Daisuke. 2008. Drawing a Line between Literal and Idiomatic Meanings Based on Supervised WSD. Linguistic Research 25(2), 105-123. Some phrases can be interpreted either idiomatically (figuratively) or literally in context, and the precise identification of idioms is indispensable for full-fledged natural language processing (NLP). To this end, we have constructe...
From etymology to modern phraseology: A corpus-based study of structural variants of Chinese idioms in naturally-occurring contexts
Compared with recent developments in English corpus lexicology and phraseology, the study of Chinese lexicology and phraseology still remains at a level similar to what Fernando has described as quasi-lexicography (Fernando, 1996: 10-11) in the study of English idioms, where explanations provided in idiom dictionaries are prescriptive and static rather than descriptive and dynamic: a typical te...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...